Learn how to benchmark embedding models on your own data in this course for beginners.
In this course, you will learn:
- The limitations of extracting text from PDF files with Python libraries, and how to overcome them with the help of VLMs (Vision Language Models).
- How to divide the extracted text into chunks that preserve context.
- How to generate questions for each chunk using LLMs (Large Language Models).
- How to use embedding models to create vector representations of the chunks and questions.
- How to work with both open-source and proprietary embedding models.
- How to use llama.cpp to run models in the GGUF format locally on your machine.
- How to benchmark different embedding models with various metrics and statistical tests using ranx.
- How to plot the vector representations to check whether clusters are forming.
- How to interpret the p-value that a statistical test provides.
- And much more!
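The benchmarking steps listed above boil down to ranking every chunk for each generated question and scoring where the chunk that question came from ends up. The sketch below illustrates that idea without any dependencies, assuming cosine similarity as the retrieval score, exactly one relevant chunk per question, MRR and Recall@k as the metrics, and an exact paired sign-flip (randomization) test for the p-value. It is not the course's code and does not use ranx, which the course relies on for metrics and statistical tests. All function names and the toy 2-D "embeddings" of the two models are hypothetical.

```python
import math
from itertools import product


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank_chunks(question_vec, chunk_vecs):
    """Return chunk ids sorted from most to least similar to the question."""
    scores = {cid: cosine(question_vec, vec) for cid, vec in chunk_vecs.items()}
    return sorted(scores, key=lambda cid: scores[cid], reverse=True)


def reciprocal_rank(ranking, relevant_id):
    """1 / (1-based position of the relevant chunk), or 0.0 if it is missing."""
    for position, cid in enumerate(ranking, start=1):
        if cid == relevant_id:
            return 1.0 / position
    return 0.0


def evaluate(question_vecs, chunk_vecs, qrels, k=1):
    """Per-question reciprocal ranks, MRR, and mean Recall@k."""
    rr, recall = [], []
    for qid, qvec in question_vecs.items():
        ranking = rank_chunks(qvec, chunk_vecs)
        rr.append(reciprocal_rank(ranking, qrels[qid]))
        recall.append(1.0 if qrels[qid] in ranking[:k] else 0.0)
    return rr, sum(rr) / len(rr), sum(recall) / len(recall)


def paired_sign_flip_p_value(scores_a, scores_b):
    """Exact two-sided paired randomization test on per-question differences."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    extreme = 0
    sign_patterns = list(product((1, -1), repeat=len(diffs)))
    for signs in sign_patterns:
        # Under the null hypothesis, each difference is equally likely to have either sign.
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            extreme += 1
    return extreme / len(sign_patterns)


# Hypothetical toy data: each question was generated from the chunk it maps to.
qrels = {"q1": "c1", "q2": "c2", "q3": "c3"}
model_a = {
    "chunks": {"c1": [1.0, 0.0], "c2": [0.0, 1.0], "c3": [0.7, 0.7]},
    "questions": {"q1": [0.9, 0.1], "q2": [0.1, 0.9], "q3": [0.6, 0.8]},
}
model_b = {
    "chunks": {"c1": [1.0, 0.0], "c2": [0.0, 1.0], "c3": [1.0, 1.0]},
    "questions": {"q1": [0.5, 1.0], "q2": [1.0, 0.2], "q3": [1.0, 0.9]},
}

rr_a, mrr_a, recall_a = evaluate(model_a["questions"], model_a["chunks"], qrels)
rr_b, mrr_b, recall_b = evaluate(model_b["questions"], model_b["chunks"], qrels)
p = paired_sign_flip_p_value(rr_a, rr_b)
print(f"MRR A={mrr_a:.3f}  MRR B={mrr_b:.3f}  p={p:.3f}")  # MRR A=1.000  MRR B=0.556  p=0.500
```

Model A looks clearly better here, yet the p-value is 0.5. With only three questions, the smallest two-sided p-value this test can produce is 2/8 = 0.25, so no difference could ever be called significant. This is why a benchmark needs many generated questions before a p-value is meaningful.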
You can find the slides, notebook, and scripts in this GitHub repository:
The dataset is available here:
To connect with Imad Saddik, check out his social accounts:
LinkedIn:
YouTube:
Website:
⭐️ Course Contents ⭐️
(0:00:00) About the course
(0:06:05) Introduction
(0:17:58) Extracting text from PDF documents
(1:01:08) Divide text into coherent chunks
(1:23:10) Generate question-answer pairs from text chunks
(1:38:48) Embed text chunks and questions
(2:17:06) Statistical tests and metrics
(3:12:01) Expanding the dataset and adding more languages
(3:45: